Intermittency as a universal characteristic of the complete chromosome DNA 
sequences of eukaryotes: From protozoa to human genomes 

Large-scale dynamical properties of complete chromosome DNA sequences of
eukaryotes are considered. Using the proposed deterministic models with
intermittency and symbolic dynamics, we describe a wide spectrum of
large-scale patterns inherent in these sequences, such as segmental
duplications, tandem repeats, and other complex sequence structures. It is
shown that the recently discovered gene number balance on the strands is not
of random nature, and that a complete chromosome DNA sequence exhibits the
properties of deterministic chaos.

The last decade witnessed outstanding discoveries about the structure of
genomic sequences. New types of large-scale polymorphisms such as copy
number variants (CNV), inversions, segmental duplications and gigantic
palindromes [1] were found and described as fundamental features of DNA
sequences. For individual human genomes these polymorphisms comprise a
significant part of the complete DNA sequence, up to 12% for CNVs and more
than 5% for recent segmental duplications. For each chromosome of the human
genome such variation comprises up to 30% of the sequence. These sequences
have a crucial effect on the dynamics of chromosome sequence evolution.

At the same time, the nature of fundamental mechanisms of genome
functioning, such as recombination and replication, results from a close
interaction with the processes of regulation of gene expression and
superspiralization. Acting on the level of the entire genome, these
processes should have a large-scale effect on the DNA sequence composition
and lead to local GC and AT asymmetry, but the contribution of each of them
remains unknown. On the other hand, global compositional symmetry, as
stated in Chargaff's second parity rule [6], has been demonstrated for
chromosomes from bacteria to human.

It is well known that duplications of different types are a common property
of DNA sequence evolution [7]. Copy number variants, segmental duplications
and gigantic palindromes compose a class of large-scale DNA duplications in
eukaryotic chromosomes. How these polymorphisms coexist with the processes
that maintain the regularity of the most important genomic functions and
with the global compositional symmetry of the chromosomal DNA sequence, and
how they are reflected in the complexity of the sequence structure, poses a
question about the random and deterministic properties of complete
chromosome sequences.

In our recent paper [8] we examined, by the 2D DNA 
walk method, large-scale properties of genome sequences 
of 671 chromosomes of bacteria, archaea, fungi and hu- 
man. We found that via in silico gene sorting by strand 
position one can obtain a completely symmetrical form 
of 2D DNA walk of the coding sequences. It was shown 
that the number of genes on different strands as well as 
their total length and the cumulative GC-skew are approximately equal
to each other for most eukaryotic chromosomes. As a result,
Chargaff's second parity rule is just as valid for coding sequences
and is totally 
defined by the gene strand positions. With a gene-vector 
model, as obtained in [8], the majority of the 
genes tend to accumulate guanine (G) over cytosine 
(C) and adenine (A) over thymine (T). 

These results sharpen the question of the stochastic or deterministic
nature of the evolutionary processes that shape the observable properties
of chromosome sequences, in particular the gene number balance on DNA
strands for complete chromosomes. How can this balance exist without any
restrictions in the presence of recombinations and random segmental
duplications?

In this Letter we consider the complete eukaryotic chromosome
sequences and suggest a simple deterministic model
which describes a wide spectrum of large-scale 
properties of real chromosome sequences across different 
taxa. These include gigantic pseudo-palindromes, sev- 
eral tandem repeats against a background of stochastic- 
like areas, and other large-scale patterns. The obtained 
results explicitly show that DNA sequences should not 
be random but possess a property of deterministic chaos. 

To analyze the generated chromosome sequences we apply the 2D DNA walk
technique. The selection of the A-T and G-C coordinate axes is dictated by
the complementarity of the chains and the hydrogen bond balance [10, 11].
Using ideas of symbolic dynamics and applying them to a one-dimensional map
of an interval, one can generate symbolic sequences according to a
vocabulary of the partition.
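
As a minimal sketch of a 2D DNA walk, the following Python function steps along the A-T axis for A and T and along the G-C axis for G and C. The step directions (A and G positive, T and C negative) and the name `dna_walk` are assumptions chosen for illustration; the text fixes only the choice of axes.

```python
from typing import List, Tuple

# Assumed step convention: x is the A-T axis, y is the G-C axis.
STEPS = {"A": (1, 0), "T": (-1, 0), "G": (0, 1), "C": (0, -1)}


def dna_walk(sequence: str) -> List[Tuple[int, int]]:
    """Return the points visited by the 2D walk, starting at the origin."""
    x, y = 0, 0
    points = [(x, y)]
    for base in sequence.upper():
        dx, dy = STEPS.get(base, (0, 0))  # unknown symbols (e.g. N) do not move
        x, y = x + dx, y + dy
        points.append((x, y))
    return points
```

With this convention a sequence containing equal numbers of A and T and of G and C returns to the origin, which is how global compositional symmetry would show up in the walk.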






It should be noted that a deterministic model was previously proposed in
[12]. The observed non-linear oscillations of the GC content and of the AT
and GC skews were compared with the characteristic features of chaotic
strange attractors. For full eukaryotic chromosome DNA sequences the
correlations of GC-skews only with replicons and with the gene strand
position (see [13]) are as yet not confirmed and show a more complex
dependence [9]. In particular, it is characteristic for the human genome to
have long non-coding T-rich subsequences and small islands of A-rich and
G-rich coding sequences. Human genes have frequent strand permutations and
form islands with a high level of occupation density. These blocks mainly
consist of single genes or bidirectional transcription pairs, and the
composition of the DNA sequence in these regions has oscillations of high
frequency, while the major part of a genome is non-coding with gigantic (up
to several megabases) low-frequency intervals [9]. All these properties
imply more complex dynamics than was discussed in [13].

Consider a map of the interval I = [0, 1] onto itself: 

x_{n+1} = f(x_n, a) = x_n + x_n^{a} \pmod{1}, \quad a > 1. \qquad (1)

This map is known as the Manneville map [14], which was introduced to
explain the intermittency phenomenon in dissipative dynamical systems. As
is known (see, e.g., [15]), the map (1) exhibits three dynamical regimes
depending on the parameter a: for 1 < a < 3/2 the dynamics is normal in the
sense that the fluctuations of the observable generated by the map
trajectory are Gaussian; for 3/2 < a < 2 the dynamics is transiently
anomalous; and for a > 2 the dynamics is (Lévy) anomalous with the Lévy
index α = 1/(a − 1).
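
To make the intermittent behavior of map (1) concrete, the sketch below iterates the map and counts how long a trajectory started near x = 0 stays in the slow 'laminar' region. The function names, the threshold 0.1 and the iteration cap are illustrative assumptions and are not taken from the cited works.

```python
def manneville(x: float, a: float) -> float:
    """One iteration of map (1): x -> x + x**a (mod 1)."""
    return (x + x ** a) % 1.0


def trajectory(x0: float, a: float, steps: int) -> list:
    """Iterate the map `steps` times, returning x0 and all iterates."""
    xs = [x0]
    for _ in range(steps):
        xs.append(manneville(xs[-1], a))
    return xs


def laminar_length(x0: float, a: float, threshold: float = 0.1,
                   cap: int = 10**6) -> int:
    """Count iterations spent below `threshold` before the first escape."""
    x, n = x0, 0
    while x < threshold and n < cap:
        x = manneville(x, a)
        n += 1
    return n
```

For a fixed starting point close to 0, a larger exponent a makes each increment x**a smaller, so `laminar_length` grows with a; this is the mechanism behind the long regular stretches discussed in the text.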

The Manneville map has been used to describe DNA 
sequences as a composition of deterministic regimes with 
long-range correlations and chaotic processes. However, 
the developed approach was limited to small DNA se- 
quences, corresponding, in biological terms, to introns 
and exons [16]. 

To develop the symbolic representation for a one-dimensional map, let us
subdivide the interval [0, 1] into four subintervals with the boundaries
0, 1/8, 1/4, 3/4 and 1. The symbolic partition AGTC then corresponds to
these subintervals, in this order. In this case the sequences generated by
the map (1) will have long regular parts and chaotic domains, similar to
some small DNA sequences in the 2D DNA walk. However, it was shown in [17]
that this model demonstrates a very narrow diversity of generated
sequences.
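
The symbolic partition described above can be sketched in a few lines. The boundaries 0, 1/8, 1/4, 3/4, 1 and the symbol order A, G, T, C follow the text; treating each subinterval as closed on the left and open on the right, and the names `symbol` and `generate`, are assumptions made for this illustration.

```python
from bisect import bisect_right

# Interior boundaries of the partition 0, 1/8, 1/4, 3/4, 1 and its symbols.
BOUNDARIES = [0.125, 0.25, 0.75]
SYMBOLS = "AGTC"


def symbol(x: float) -> str:
    """Map a point of [0, 1] to the nucleotide of its subinterval."""
    return SYMBOLS[bisect_right(BOUNDARIES, x)]


def generate(x0: float, a: float, length: int) -> str:
    """Emit one symbol per iteration of map (1), x -> x + x**a (mod 1)."""
    x, out = x0, []
    for _ in range(length):
        out.append(symbol(x))
        x = (x + x ** a) % 1.0
    return "".join(out)
```

A trajectory started close to 0 emits a long run of A before it escapes, which corresponds to the long regular parts of the generated sequence.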

To model the real total sequence of nucleotides in chro- 
mosomes, let us consider a modified Manneville map with 
two accumulating marginal points (traps) and a compli- 
cated partition of the interval based on the triplets of 
nucleotides (codons): 

x_{n+1} = x_n + b\,x_n^{a}\left(\tfrac{1}{2} - x_n\right)(1 - x_n)^{a} \pmod{1}. \qquad (2)

In the vicinity of the points x = 0 and x = 1 the map has the properties
of the original Manneville map (see Fig. 1). Therefore, at certain
parameter values it exhibits the intermittency property with two 'laminar'
regions around the points 0 and 1.

FIG. 1: Plot of the map with intermittency (2) with two marginal points.

FIG. 2: (a) 2D DNA walk of the sequence of chromosome 4 of Drosophila;
(b) 2D walk generated by the map (2).

A symbolic sequence constructed from A, G, T and C may be obtained as
follows. Let us divide the interval [0, 1] into 64 equal parts. Each domain
of the partition uniquely corresponds to a single triplet (s_1, s_2, s_3),
where s_i ∈ {A, G, T, C}. Note that the behavior of the phase trajectory
and, as a consequence, the trajectory of the 2D DNA walk in this model are
controlled by two numerical parameters and by the order of the codons that
form the partition of the interval [0, 1]. By varying these parameters one
can significantly change the behavior of the walk. This property of the
system allows us to obtain various types of 2D DNA walk trajectories,
qualitatively reflecting the large-scale characteristics of real
chromosomes.
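
A minimal sketch of the 64-cell codon partition is given below. The lexicographic codon order over "AGTC" is a hypothetical choice made only for illustration; the text leaves the order free, and it is exactly this order that shapes the resulting walk.

```python
from itertools import product

# Hypothetical codon order: lexicographic over "AGTC". Any permutation of the
# 64 codons is an equally valid partition and changes the resulting walk.
CODONS = ["".join(t) for t in product("AGTC", repeat=3)]


def codon(x: float, codons=CODONS) -> str:
    """Return the triplet assigned to the cell of [0, 1] that contains x."""
    k = min(int(x * len(codons)), len(codons) - 1)  # clamp the point x == 1.0
    return codons[k]


def to_sequence(xs, codons=CODONS) -> str:
    """Translate a phase trajectory into nucleotides, three bases per point."""
    return "".join(codon(x, codons) for x in xs)
```

Passing a different list as `codons` changes the partition without touching the map, which separates the two kinds of control mentioned above: the numerical parameters of the map and the order of the codons.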

At the first step let us show that this map allows us to model
subsequences with repeats. A comparison of repeat tracks for the real
chromosome and for the chromosome simulated by the map (2) is shown in
Fig. 2a,b, respectively. In terms of symbolic dynamics, to simulate
repeats the phase trajectory should go deep into the 'trap' and stay there
long enough to generate the corresponding symbols (codons). Long stretches
of tri-nucleotide repeats are a common phenomenon in real chromosome DNA
sequences. Their origin may be of different nature, but the most common
explanation is slippage during the replication process due to the
formation of stem-loop secondary structures [18]. In our model such a
process is defined by the existence of a long-distance pattern in a
sequence, which represents a path of the phase trajectory deep into the
trap. In this model, repeat patterns can also be reproduced in the regime
of 3N-nucleotide repeats.

To provide for the 'trap' regime, it is sufficient to increase the
exponent a in the map (2). Then if, in the process of iteration, the phase
trajectory comes close enough to the traps (point 0 or 1), it will stay in
their neighborhood for a long time, thereby producing the same codon.
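
The effect of the exponent a on the repeat length can be checked with the sketch below. It assumes that the middle factor of map (2) is (1/2 − x_n), uses a 64-cell partition, and picks the parameter values only for illustration; none of these numbers come from the fitted models shown in the figures.

```python
def two_trap_map(x: float, a: float, b: float) -> float:
    """One iteration of map (2), assuming the middle factor is (1/2 - x)."""
    return (x + b * x ** a * (0.5 - x) * (1.0 - x) ** a) % 1.0


def repeat_length(x0: float, a: float, b: float, cells: int = 64,
                  cap: int = 10**6) -> int:
    """Count consecutive iterations spent in the partition cell containing x0.

    Each iteration emits the codon of the current cell, so this is the length
    (in codons) of the tandem repeat produced before the trajectory escapes.
    """
    start_cell = int(x0 * cells)
    x, n = x0, 0
    while int(x * cells) == start_cell and n < cap:
        x = two_trap_map(x, a, b)
        n += 1
    return n
```

For the same starting point near the trap at 0, `repeat_length(1e-3, 3.0, 4.0)` is larger than `repeat_length(1e-3, 2.0, 4.0)`: a larger exponent flattens the map near the trap and lengthens the run of identical codons.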




FIG. 3: Plot of the map (3) with type-I intermittency.

FIG. 4: (Color online) (a) 2D DNA walk of the sequence of a duplicated
palindrome in human chromosome X (~300 kb); (b) 2D walk generated by the
map (3). Symmetrical parts of the palindrome are marked in red and blue.

If one needs to obtain more complex repetitive patterns, one may divide
the interval [0, 1] not into 64 subintervals but into a several-fold
larger number, e.g. 2 · 64 · n, where n is an integer. To produce inverted
patterns it is enough to introduce a symmetry into the specified partition
in such a way that the inverted complementary codons are positioned
symmetrically with respect to the center of the interval [0, 1]. This idea
stems from the fact that a real chromosome has two complementary DNA
strands and, as we have shown in [8], the gene numbers on the two strands
are equal for the majority of chromosomes.
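
One way to build such a symmetric partition is sketched below. The left half repeats the 64 codons n times in a hypothetical lexicographic order, and the right half mirrors it with reverse complements; the function names and the codon order are assumptions for illustration.

```python
from itertools import product

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the inverted complementary sequence."""
    return "".join(COMPLEMENT[s] for s in reversed(seq))


def symmetric_partition(n: int = 1) -> list:
    """Build a partition of [0, 1] with 2*64*n cells.

    The left half lists the 64 codons n times (hypothetical lexicographic
    order); the right half mirrors it with reverse complements, so cells k
    and (last - k) always hold inverted complementary codons.
    """
    left = ["".join(t) for t in product("AGTC", repeat=3)] * n
    right = [reverse_complement(c) for c in reversed(left)]
    return left + right
```

With this layout, a trajectory spending equal time on both sides of the center emits codons and their inverted complements with equal frequency, mirroring the balance between the two strands.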

In this case we relate DNA strands to 'traps', with the most widely used
codons positioned in the center of such 'traps'. In terms of this model,
gigantic intervals (pseudo-palindromes) with low-frequency changes in
composition can also be interpreted as a result of pseudogene sequence
drift, which can contribute to the formation of non-coding regions.
Importantly, here the trap close to 0 is out of equilibrium, while the trap
close to 1 is a real trap. This property can provide different codon usage
statistics that can qualitatively reproduce a strand-specific mutation
pressure in real full chromosome sequences [19].

Then, in the process of escaping from the traps 0 and 1, the phase
trajectory gives rise to statistics of short inverse-complementary
patterns, but not to real long inverse-complementary sequences such as
palindromes.

In order to obtain a palindromic sequence, the phase trajectory of the map
has to go through the complementary codons in the opposite direction
immediately after the direct pass, or after a spacer segment. This can be
achieved when the trajectory not only escapes from the traps, as discussed
above, but also enters them at the same rate as the rate of escape. This
means that the trap should not be a fixed point. Such a configuration can
be obtained with a model which shows type-I intermittency [20], which
arises at the destruction of the fixed point through a tangent
bifurcation:



x_{n+1} = x_n + \left(x_n - \tfrac{1}{2}\right)^{c} + \varepsilon \pmod{1}. \qquad (3)



In a tangent bifurcation there is a narrow region (tunnel) which the
trajectory slowly enters and also slowly escapes (Fig. 3). If the codons
forming the symbolic partition are positioned complementarily with respect
to a hypothetical center of this area, then, by passing through the
tunnel, the trajectory will create palindromic sequences.
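
The palindrome mechanism can be illustrated with the sketch below. It assumes that the tunnel of map (3) is centered at x = 1/2 with exponent c = 2, uses a 128-cell partition that is mirror-symmetric about the center (hypothetical codon order), and collapses repeated visits to a cell so that only the order of the crossed cells is recorded. All names and parameter values are illustrative.

```python
from itertools import product

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the inverted complementary sequence."""
    return "".join(COMPLEMENT[s] for s in reversed(seq))


# 128-cell partition, mirror-symmetric about x = 1/2 (hypothetical codon order).
LEFT = ["".join(t) for t in product("AGTC", repeat=3)]
PARTITION = LEFT + [reverse_complement(c) for c in reversed(LEFT)]


def tunnel_map(x: float, eps: float, c: int = 2) -> float:
    """One iteration of map (3), assuming the tunnel is centred at x = 1/2."""
    return (x + (x - 0.5) ** c + eps) % 1.0


def tunnel_pass(first_cell: int = 57, eps: float = 1e-4) -> str:
    """Walk through the tunnel from `first_cell` to its mirror cell.

    Repeated visits to the same cell are collapsed to one codon, so the result
    records only the order of the cells crossed.
    """
    cells = len(PARTITION)
    last_cell = cells - 1 - first_cell
    x = (first_cell + 0.5) / cells
    visited = []
    while int(x * cells) <= last_cell:
        k = int(x * cells)
        if not visited or visited[-1] != k:
            visited.append(k)
        x = tunnel_map(x, eps)
    return "".join(PARTITION[k] for k in visited)
```

The string returned by `tunnel_pass()` equals its own reverse complement. Without the collapsing step the run lengths on the two sides of the center match only approximately, so the raw output of this sketch is a pseudo-palindrome rather than an exact one.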

A real duplicated palindrome of about 300 kb from human chromosome X is
shown in Fig. 4a. By adjusting the width and profile of the tunnel in the
map (3) it is possible to attain a sufficiently wide variety of generated
walks with the inverse-complementary type of sequences (see Fig. 4b).

FIG. 5: (a) 2D DNA walk of a large part of human chromosome X (~30 Mb);
(b) 2D walk generated by the system of two coupled maps (4), d > 0;
(c) 2D DNA walk of chromosome 9 of Trypanosoma brucei; (d) 2D walk
generated by the system (4) without coupling (d = 0).

As was shown in [8], the 2D DNA walk of real chromosomes can be
transformed into a symmetrical trajectory by sorting the genes on strands,
and thus full chromosome sequences contain a hidden symmetry. Based on the
discovered symmetrical properties of chromosome sequences, we suggest that
in the proposed model the 2D DNA walk of chromosomes can be generally
divided into two types of dynamic processes.

One of the processes is associated with the presence of regions noted for
their large-scale inverse-complementary symmetry. Such regions may arise
as a consequence of various mechanisms, including inverse intrachromosomal
duplication [21, 22], subtelomere and telomere repeats, transposon
insertions, recombination-caused strand switches of genes with close
compositional structure, strand-specific mutational pressure, etc. The
second process reflects the fact that in full-length chromosomes there is
a balance in the number of genes on the strands [8], and thus this process
represents a statistically confirmed hidden symmetry. As discussed above,
the first of the processes can be effectively modeled by the map with
type-I intermittency, which is characterized by the presence of a narrow
'tunnel' that produces the global compositional symmetry (Fig. 3). The
second process is associated with gene strand switches and can be
qualitatively described by the symmetric Pomeau-Manneville map with two
'traps', as shown in Fig. 1.

It is possible to combine these two processes into one two-dimensional
model. The dynamics in one direction is mainly determined by the map with
a 'tunnel', and the behavior in the second direction is governed by the
map with 'traps'. In general, these two dynamics are mutually dependent.
This means that the maps should be coupled into a system with parametric
control of the interdependence of the two dimensions. The symbolic
dynamics for the constructed two-dimensional map can be formed by a
partition of the two-dimensional space, which is composed of the direct
product of the partitions of the one-dimensional spaces. The system of two
maps, coupled by discrete diffusion, can be written as follows:

x_{n+1} = x_n + b\,x_n^{a}\left(\tfrac{1}{2} - x_n\right)(1 - x_n)^{a} + d\,(y_n - x_n),

y_{n+1} = y_n + \left(y_n - \tfrac{1}{2}\right)^{c} + \varepsilon + d\,(x_n - y_n), \qquad (4)

where x and y are defined modulo 1. Fig. 5a and Fig. 5b show a qualitative
similarity between the real sequence of the human chromosome X and the one
generated by the coupled maps (4). Even in the absence of coupling between
the maps, the trajectory of the 2D walk has a complicated form. In this
case the dynamics of the system is only a direct product of the dynamics
of two one-dimensional maps. The 2D DNA walk of the Trypanosoma brucei
chromosome 9 and the 2D walk of the sequence generated by the model (4)
without coupling are shown in Fig. 5c and Fig. 5d, respectively.
Evidently, the presence of the diffusion coupling between the two
processes may increase the complexity of the global landscape of the
sequence.
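
The coupled system can be iterated as in the sketch below. It assumes the middle factor (1/2 − x_n) in the 'trap' map and a tunnel centered at y = 1/2 in the 'tunnel' map; the default parameter values are purely illustrative and are not the values used for the figures.

```python
def coupled_step(x, y, a, b, c, eps, d):
    """One iteration of system (4): 'trap' map in x, 'tunnel' map in y.

    Assumes the middle factor (1/2 - x) in the x-map and a tunnel centred
    at y = 1/2 in the y-map; both coordinates are taken modulo 1.
    """
    x_next = (x + b * x ** a * (0.5 - x) * (1.0 - x) ** a + d * (y - x)) % 1.0
    y_next = (y + (y - 0.5) ** c + eps + d * (x - y)) % 1.0
    return x_next, y_next


def coupled_trajectory(x0, y0, steps, a=2.0, b=4.0, c=2, eps=1e-3, d=0.0):
    """Iterate the coupled system; all default values are illustrative."""
    points = [(x0, y0)]
    for _ in range(steps):
        x, y = points[-1]
        points.append(coupled_step(x, y, a, b, c, eps, d))
    return points
```

With d = 0 the two coordinates evolve independently, so the x-component coincides with a standalone iteration of the 'trap' map; a nonzero d lets each coordinate pull the other toward itself at every step.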

In conclusion, the results reported in this Letter show that deterministic
dynamics with intermittency can reproduce complex properties of real
eukaryotic chromosome DNA sequences, including the global inverted
symmetry of subtelomeres and various types of large-scale polymorphisms.
The gene number balance on DNA strands for most complete eukaryotic
chromosome sequences (see [8]), which in this model is the result of the
time balance of the phase trajectory in symmetrical traps, may naturally
coexist with large-scale polymorphisms. Based on the proposed model we
suppose that an increase of sequence complexity via the value of the
diffusion coupling d between the subsystems of (4), represented in the
dynamics of type-I and type-II intermittency, may reflect the evolutionary
distance between certain system properties of real genomes.